Score Region Algebra: A Framework for Structured IR
Author
Abstract
We address the problem of developing a flexible framework for information retrieval (IR) in structured documents, such as XML. The framework supports a wide range of structured IR queries, transparent instantiation of different retrieval models, and different physical implementations. It is based on the so-called score region algebra (SRA), which can express the following four essential aspects of ranked retrieval for structured IR: term and element selection, element relevance score computation, element score propagation, and element score combination. Our preliminary research shows that different instantiations of each aspect, as well as different combinations of these instantiations, yield significantly different results. Our goal is to better understand structured IR by studying these aspects individually and in combination within the SRA framework, and to use this knowledge to improve our structured IR system.
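As a rough illustration of the four aspects named in the abstract, the following Python sketch models document parts as scored regions. It is not the SRA operator set as defined by the authors: the `Region` type, the function names, the count-based relevance score, the sum-based propagation, and the choice of product and sum for combination are all assumptions made for this example.

```python
from dataclasses import dataclass, replace
from operator import add, mul


@dataclass(frozen=True)
class Region:
    """Hypothetical region: a text span with a kind, a name, and a score."""
    start: int
    end: int
    kind: str   # "element" or "term"
    name: str
    score: float = 1.0


def contains(outer, inner):
    """True if `inner` lies within the bounds of `outer`."""
    return outer.start <= inner.start and inner.end <= outer.end


def select(regions, kind, name):
    """Aspect 1: term and element selection."""
    return [r for r in regions if r.kind == kind and r.name == name]


def score_by_term(elements, terms):
    """Aspect 2: relevance score computation (assumed: count of contained terms)."""
    return [
        replace(e, score=e.score * sum(1 for t in terms if contains(e, t)))
        for e in elements
    ]


def propagate_up(ancestors, scored):
    """Aspect 3: score propagation (assumed: sum of contained region scores)."""
    return [
        replace(a, score=sum(s.score for s in scored if contains(a, s)))
        for a in ancestors
    ]


def combine(left, right, op):
    """Aspect 4: score combination of regions with identical bounds."""
    right_scores = {(r.start, r.end): r.score for r in right}
    return [
        replace(l, score=op(l.score, right_scores.get((l.start, l.end), 0.0)))
        for l in left
    ]


# Toy document: one article containing two sections and four term occurrences.
regions = [
    Region(0, 20, "element", "article"),
    Region(1, 9, "element", "sec"),
    Region(10, 19, "element", "sec"),
    Region(2, 2, "term", "xml"),
    Region(5, 5, "term", "xml"),
    Region(12, 12, "term", "xml"),
    Region(3, 3, "term", "algebra"),
]

secs = select(regions, "element", "sec")
xml_secs = score_by_term(secs, select(regions, "term", "xml"))
alg_secs = score_by_term(secs, select(regions, "term", "algebra"))

both = combine(xml_secs, alg_secs, mul)    # AND-like combination via product
either = combine(xml_secs, alg_secs, add)  # OR-like combination via sum
article = propagate_up(select(regions, "element", "article"), both)
```

Swapping `mul` and `add` for other functions, or replacing the body of `score_by_term`, changes the retrieval model without touching the query plan, which is the kind of per-aspect instantiation the abstract refers to.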
Similar resources
Hiemstra Peter
A unified database framework that will enable better comprehension of ranked XML retrieval is still a challenge in the XML database field. We propose a logical algebra, named score region algebra, that enables transparent specification of information retrieval (IR) models for XML databases. The transparency is achieved through the ability to instantiate various retrieval models, using abstract sco...
Score region algebra: a flexible framework for structured information retrieval
operators ⊗ and ⊕ are implemented as product and sum. In addition, an operator @p is defined similarly to Ap, except that it returns regions from the left operand that are contained in regions from the right operand, or regions from the left operand that have the same region bounds as regions in the right operand. The score attribute is defined through the f@(r1, R2) function defined in Equation 3.13.
Exploiting Query Structure and Document Structure to Improve Document Retrieval Effectiveness
In this paper we present a systematic analysis of document retrieval using unstructured and structured queries within the score region algebra (SRA) structured retrieval framework. The behavior of different retrieval models, namely Boolean, tf.idf, GPX, language models, and Okapi, is tested using the transparent SRA framework in our three-level structured retrieval system called TIJAH. The retr...
Optimizing XML Information Retrieval Query Execution at the Physical Level
XML is emerging as a standard format for information interchange and storage of structured information. The widespread use of XML has sparked the interest of both the database and information retrieval research communities. XML databases are designed to store and query large volumes of XML data. Structured information retrieval or XML-IR is the application of information retrieval concepts and...
Utilizing Structural Knowledge for Information Retrieval in XML Databases
In this paper we address the problem of immediate translation of eXtensible Mark-up Language (XML) information retrieval (IR) queries to relational database expressions and stress the benefits of using an intermediate XML-specific algebra over relational algebra. We show how adding an XML-specific algebra at the logical level of a DBMS enables a level of abstraction from both query ...